Stable Diffusion from Beginner to Master (2): Dreambooth
Deep Learning
Python
Published
December 11, 2022
This is the second post in the Stable Diffusion tutorial series. In this tutorial, we will learn how to fine-tune the Stable Diffusion model on new images (a technique known as Dreambooth).
Setup
First, let’s install all the dependencies we need to train the model. To avoid losing the model and the data, we will save them in Google Drive. Make sure you have enough free space in your Google Drive before continuing.
Token is valid.
Your token has been saved in your configured git credential helpers (store).
Your token has been saved to /root/.huggingface/token
Login successful
Preparing Images
Getting Images
For the purposes of this tutorial, let’s download some images from DuckDuckGo. Alternatively, you can prepare your own images and put them inside a Google Drive folder (referred to as images_src_dir in the following sections).
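The download cell isn’t shown in the post; a minimal sketch of fetching a list of image URLs and saving each one under a collision-free random filename (the URL list itself, e.g. from a DuckDuckGo search library, is up to you):

```python
import uuid
from pathlib import Path


def save_image_bytes(data: bytes, dest_dir: str, ext: str = ".jpg") -> Path:
    """Write raw image bytes under a random UUID filename inside dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    path = dest / f"{uuid.uuid4()}{ext}"
    path.write_bytes(data)
    return path


def download_images(urls, dest_dir: str) -> list:
    """Download each URL and save it to dest_dir; skip anything that fails."""
    import requests  # imported lazily; only needed when real URLs are passed

    saved = []
    for url in urls:
        try:
            resp = requests.get(url, timeout=10)
            resp.raise_for_status()
        except requests.RequestException:
            continue
        saved.append(save_image_bytes(resp.content, dest_dir))
    return saved
```

Random UUID names avoid clobbering files when several search results share the same basename.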
Although SD doesn’t restrict image sizes (other than that the width and height must be divisible by 8), we perform a center crop and resize on all images to make them the same square shape, since all images in a training batch must have the same dimensions.
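The crop-and-resize step can be sketched with Pillow; 512 here is just an example size (use 768 for the SD2 768-resolution model):

```python
from PIL import Image


def center_crop_resize(img: Image.Image, size: int = 512) -> Image.Image:
    """Center-crop to the largest square, then resize to size x size."""
    w, h = img.size
    side = min(w, h)
    left = (w - side) // 2
    top = (h - side) // 2
    img = img.crop((left, top, left + side, top + side))
    return img.resize((size, size), Image.LANCZOS)
```

Apply it to every file in images_src_dir and save the results into the training directory.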
Cropping might produce some bad images, but no worries: we will clean them up in the next section.
One of the most important factors in any ML application, if not THE most important, is data quality. To get the best results, let’s inspect our training images and remove the “bad” ones (especially those that don’t contain complete faces after cropping; we don’t want the final model to learn to generate half faces!)
FastAI provides an ImagesCleaner class, a very handy tool for removing images from within a Jupyter notebook. Just select “Delete” for the images you want to remove, then run the following cells to delete them.
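In a notebook the flow looks roughly like this; since ImagesCleaner is an interactive widget, the fastai lines are shown as comments, and the plain deletion helper (an illustrative function, not part of fastai) works anywhere:

```python
from pathlib import Path

# Interactive part (run in a notebook with fastai installed):
#   from fastai.vision.all import get_image_files
#   from fastai.vision.widgets import ImagesCleaner
#   fns = get_image_files(images_training_dir)
#   cleaner = ImagesCleaner(max_n=200)
#   cleaner.set_fns(fns)
#   cleaner  # mark bad images as "Delete" in the widget


def delete_marked(fns, marked_indices) -> int:
    """Remove the files flagged in the widget; returns how many were deleted."""
    deleted = 0
    for idx in marked_indices:
        path = Path(fns[idx])
        if path.exists():
            path.unlink()
            deleted += 1
    return deleted

# After reviewing: delete_marked(cleaner.fns, cleaner.delete())
```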
First, let’s train with the diffusers example script.
There are a few parameters worth noting:
MODEL_NAME This is the ID of your base model, e.g. SD1.5 or SD2.0. By default, training SD2.0 at 768 resolution (“stabilityai/stable-diffusion-2”) requires more than 16GB of memory (about 22GB without xformers, in my experiments).
STEPS The number of steps to train. The recommended value scales with num_examples (the number of training images).
SAVE_STEPS A checkpoint will be exported every SAVE_STEPS steps, both to avoid losing all progress when your GPU is reclaimed and to enable comparing different checkpoints to see which is best.
OUTPUT_DIR This is where the trained model is exported.
INSTANCE_DIR This points to the training images.
INSTANCE_PROMPT This is the prompt for the training instances. In the diffusers example we are using a fixed prompt “a photo of xyz” for every instance image. This may not be optimal, and we’ll see how we can improve it later on.
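Putting these parameters together, the diffusers example script is typically launched via accelerate. A sketch that builds the command string (the output path is an assumption for this vanilla run, and exact flag names should be checked against your diffusers version):

```python
import shlex

MODEL_NAME = "stabilityai/stable-diffusion-2"
INSTANCE_DIR = "/content/drive/MyDrive/sd/images_training"
OUTPUT_DIR = "/content/drive/MyDrive/sd/models/asuka"  # assumed path for the vanilla run
INSTANCE_PROMPT = "a photo of saitoasuka"
STEPS = 10000
SAVE_STEPS = 3000

cmd = [
    "accelerate", "launch", "train_dreambooth.py",
    f"--pretrained_model_name_or_path={MODEL_NAME}",
    f"--instance_data_dir={INSTANCE_DIR}",
    f"--output_dir={OUTPUT_DIR}",
    f"--instance_prompt={INSTANCE_PROMPT}",
    f"--max_train_steps={STEPS}",
    f"--save_steps={SAVE_STEPS}",
    "--resolution=768",        # SD2 at 768; use 512 for SD1.x
    "--train_batch_size=1",
    "--learning_rate=3e-6",
    "--gradient_checkpointing",  # trades speed for memory
    "--use_8bit_adam",           # bitsandbytes 8-bit Adam, also for memory
    "--train_text_encoder",
]
print(shlex.join(cmd))
```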
Make sure you have selected a GPU runtime (preferably the best GPU available) before continuing.
Now that we have the model, we can use the Pipeline API to make predictions. To facilitate experiments, I have created a web UI inspired by AUTOMATIC1111. The difference is that it uses the diffusers Pipeline, whereas the AUTOMATIC1111 web UI uses the original Stable Diffusion codebase.
The model_id is the path to your trained model (i.e., OUTPUT_DIR from the last section). You can also pass the standard model IDs for comparison.
output_dir is the directory where predicted images will be saved. If empty, images will be saved in the model_path / outputs directory.
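Outside the web UI, loading the trained checkpoint and generating an image with the diffusers Pipeline API looks roughly like this; the model loading is wrapped in a function because it needs a GPU and the trained weights, and the saving behavior mirrors the output_dir convention above:

```python
def generate(model_id, prompt, negative_prompt="", seed=42, output_dir=None):
    """Load a trained checkpoint and run a single text-to-image generation."""
    import torch
    from pathlib import Path
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
    # A fixed seed makes runs comparable across checkpoints.
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(prompt, negative_prompt=negative_prompt, generator=generator).images[0]
    # Default to model_path / outputs when no output_dir is given.
    out_dir = Path(output_dir) if output_dir else Path(model_id) / "outputs"
    out_dir.mkdir(parents=True, exist_ok=True)
    image.save(out_dir / f"seed{seed}.png")
    return image
```

For example: generate("/content/drive/MyDrive/sd/models/asuka_blip_v2", "A portrait of saitoasuka wearing a baseball hat").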
I ran an experiment using the same prompt with random styles.
Experiment Params:
Steps: 6000
Prompt
A portrait of saitoasuka wearing a baseball hat
Negative Prompts
disfigured, kitsch, ugly, oversaturated, grain, low-res, Deformed, blurry, bad anatomy, disfigured, poorly drawn face, mutation, mutated, extra limb, ugly, poorly drawn hands, missing limb, blurry, floating limbs, disconnected limbs, malformed hands, blur, out of focus, long neck, long body, ugly, disgusting, poorly drawn, childish, mutilated, mangled, old, surreal
Seed: 42
Here are the results:
Image quality looks OK, but none of them is wearing a hat! This may be because we always used the same prompt, “a photo of xyz”, during training, so the model forgot what “hat” means.
Improving the Results
Using BLIP Captions
In the vanilla case, we used the same prompt for every image instance, which is not ideal for the model: it creates a training/testing mismatch, because in training we always use the same short prompt, but at inference time we use very different prompts.
BLIP is a model that generates text descriptions from images. We can use it to automatically generate informative descriptions and append them to the prompts, which could help our model generalize better.
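The captioning cell isn’t shown in the post; a sketch of producing the image_captions.pickle file consumed by the training code below, using the transformers BLIP checkpoint (the checkpoint name and sampling settings are assumptions; several candidate captions are sampled per image so the dataset can pick one at random):

```python
import pickle
from pathlib import Path


def build_caption_file(image_dir, out_path="image_captions.pickle", num_captions=5):
    """Caption every image in image_dir and pickle a {path: [captions]} mapping."""
    import torch
    from PIL import Image
    from transformers import BlipProcessor, BlipForConditionalGeneration

    device = "cuda" if torch.cuda.is_available() else "cpu"
    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    model = BlipForConditionalGeneration.from_pretrained(
        "Salesforce/blip-image-captioning-base"
    ).to(device)

    captions = {}
    for path in Path(image_dir).iterdir():
        image = Image.open(path).convert("RGB")
        inputs = processor(images=image, return_tensors="pt").to(device)
        # Sample several diverse captions per image.
        out = model.generate(
            **inputs, do_sample=True, top_p=0.9, max_length=40,
            num_return_sequences=num_captions,
        )
        captions[str(path)] = [
            processor.decode(ids, skip_special_tokens=True) for ids in out
        ]

    with open(out_path, "wb") as f:
        pickle.dump(captions, f)
    return captions
```

The keys are absolute image paths, matching how the dataset class below looks captions up by str(image_path).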
We use the diffusers training script as a base and modify the DreamBoothDataset class so that it accepts an additional parameter, image_captions, containing the prompt mappings. To make accelerate run in Colab, we also made some small changes, but those should be quite straightforward.
PRETRAINED_MODEL = "stabilityai/stable-diffusion-2" #@param ["runwayml/stable-diffusion-v1-5", "CompVis/stable-diffusion-v1-4", "stabilityai/stable-diffusion-2", "stabilityai/stable-diffusion-2-base"] {allow-input: true, type: "string"}
STEPS = 10000 #@param {type: "integer"}
SAVE_STEPS = 3000 #@param {type: "integer"}
OUTPUT_DIR = "/content/drive/MyDrive/sd/models/asuka_blip_v2" #@param {type: "string"}
INSTANCE_DIR = "/content/drive/MyDrive/sd/images_training" #@param {type: "string"}
INSTANCE_PROMPT = "a photo of saitoasuka" #@param {type: "string"}
USE_BLIP_CAPTIONS = True #@param {type:"boolean"}
LEARNING_RATE = 3e-6 #@param {type: "number"}

from argparse import Namespace
import os
import numpy as np
import matplotlib.pyplot as plt
import PIL
import random
import pickle

args = Namespace(
    pretrained_model_name_or_path=PRETRAINED_MODEL,  # Path to pretrained model or model identifier from huggingface.co/models.
    revision=None,  # Revision of pretrained model identifier from huggingface.co/models.
    tokenizer_name=None,  # Pretrained tokenizer name or path if not the same as model_name
    instance_data_dir=INSTANCE_DIR,  # A folder containing the training data of instance images.
    class_data_dir=None,  # A folder containing the training data of class images.
    instance_prompt=INSTANCE_PROMPT,  # The prompt with identifier specifying the instance
    class_prompt=None,  # The prompt to specify images in the same class as provided instance images.
    with_prior_preservation=False,  # Flag to add prior preservation loss.
    prior_loss_weight=1.0,  # The weight of prior preservation loss.
    num_class_images=100,  # Minimal class images for prior preservation loss. If there are not enough images, additional images will be sampled with class_prompt.
    output_dir=OUTPUT_DIR,  # The output directory where the model predictions and checkpoints will be written.
    seed=None,  # A seed for reproducible training.
    resolution=768 if PRETRAINED_MODEL == "stabilityai/stable-diffusion-2" else 512,  # The resolution for input images; all images in the train/validation dataset will be resized to this resolution.
    center_crop=True,  # Whether to center crop images before resizing to resolution
    train_text_encoder=True,  # Whether to train the text encoder
    train_batch_size=1,  # Batch size (per device) for the training dataloader.
    sample_batch_size=4,  # Batch size (per device) for sampling images.
    num_train_epochs=1,
    max_train_steps=STEPS,  # Total number of training steps to perform. If provided, overrides num_train_epochs.
    save_steps=SAVE_STEPS,  # Save checkpoint every X update steps.
    gradient_accumulation_steps=1,  # Number of update steps to accumulate before performing a backward/update pass.
    gradient_checkpointing=True,  # Whether or not to use gradient checkpointing to save memory at the expense of slower backward pass.
    learning_rate=LEARNING_RATE,  # Initial learning rate (after the potential warmup period) to use.
    scale_lr=False,  # Scale the learning rate by the number of GPUs, gradient accumulation steps, and batch size.
    lr_scheduler="polynomial",  # The scheduler type to use. Choose between ["linear", "cosine", "cosine_with_restarts", "polynomial", "constant", "constant_with_warmup"]
    lr_warmup_steps=0,  # Number of steps for the warmup in the lr scheduler.
    use_8bit_adam=True,  # Whether or not to use 8-bit Adam from bitsandbytes
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_weight_decay=1e-2,
    adam_epsilon=1e-8,
    max_grad_norm=1.0,
    push_to_hub=False,
    hub_token=None,
    hub_model_id=None,
    logging_dir='logs',
    mixed_precision=None,  # ["no", "fp16", "bf16"]
    local_rank=-1,  # For distributed training: local_rank
)

if not os.path.exists('train_dreambooth.py'):
    !cd /content/
    !wget https://raw.githubusercontent.com/huggingface/diffusers/main/examples/dreambooth/train_dreambooth.py -O train_dreambooth.py
assert os.path.exists('train_dreambooth.py'), 'Unable to download train_dreambooth.py'

import train_dreambooth
train_dreambooth.args = args
from train_dreambooth import *

if USE_BLIP_CAPTIONS:
    with open('image_captions.pickle', 'rb') as f:
        image_captions = pickle.load(f)
else:
    image_captions = {}
print('Using image captions:', image_captions)


class DreamBoothDataset(Dataset):
    """
    A dataset to prepare the instance and class images with the prompts for fine-tuning the model.
    It pre-processes the images and tokenizes the prompts.
    """

    def __init__(
        self,
        instance_data_root,
        instance_prompt,
        tokenizer,
        image_captions=None,
        class_data_root=None,
        class_prompt=None,
        size=512,
        center_crop=False,
    ):
        self.size = size
        self.center_crop = center_crop
        self.tokenizer = tokenizer

        self.instance_data_root = Path(instance_data_root)
        if not self.instance_data_root.exists():
            raise ValueError("Instance images root doesn't exist.")

        self.instance_images_path = list(Path(instance_data_root).iterdir())
        self.num_instance_images = len(self.instance_images_path)
        self.instance_prompt = instance_prompt
        self.image_captions = image_captions or {}
        self._length = self.num_instance_images

        if class_data_root is not None:
            self.class_data_root = Path(class_data_root)
            self.class_data_root.mkdir(parents=True, exist_ok=True)
            self.class_images_path = list(self.class_data_root.iterdir())
            self.num_class_images = len(self.class_images_path)
            self._length = max(self.num_class_images, self.num_instance_images)
            self.class_prompt = class_prompt
        else:
            self.class_data_root = None

        self.image_transforms = transforms.Compose(
            [
                transforms.Resize(size, interpolation=transforms.InterpolationMode.BILINEAR),
                transforms.CenterCrop(size) if center_crop else transforms.RandomCrop(size),
                transforms.ToTensor(),
                transforms.Normalize([0.5], [0.5]),
            ]
        )

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        example = {}
        image_path = self.instance_images_path[index % self.num_instance_images]
        captions = self.image_captions.get(str(image_path))
        if captions:
            caption = random.choice(captions)
            prompt = f'{self.instance_prompt}, {caption}'
        else:
            prompt = self.instance_prompt
        instance_image = Image.open(image_path)
        if not instance_image.mode == "RGB":
            instance_image = instance_image.convert("RGB")
        example["instance_images"] = self.image_transforms(instance_image)
        example["instance_prompt_ids"] = self.tokenizer(
            prompt,
            padding="do_not_pad",
            truncation=True,
            max_length=self.tokenizer.model_max_length,
        ).input_ids

        if self.class_data_root:
            class_image = Image.open(self.class_images_path[index % self.num_class_images])
            if not class_image.mode == "RGB":
                class_image = class_image.convert("RGB")
            example["class_images"] = self.image_transforms(class_image)
            example["class_prompt_ids"] = self.tokenizer(
                self.class_prompt,
                padding="do_not_pad",
                truncation=True,
                max_length=self.tokenizer.model_max_length,
            ).input_ids

        return example


def display_example(tokenizer, ex):
    image = ex['instance_images']
    image_norm = ((image + 1) / 2 * 255)
    image_np = image_norm.numpy().astype(np.uint8).transpose((1, 2, 0))
    prompt = tokenizer.decode(ex['instance_prompt_ids'])
    raw_image = PIL.Image.fromarray(image_np)
    print(prompt)
    display(raw_image)


def display_training_examples():
    tokenizer = AutoTokenizer.from_pretrained(
        PRETRAINED_MODEL,
        subfolder="tokenizer",
        revision=None,
        use_fast=False,
    )
    train_dataset = DreamBoothDataset(
        instance_data_root=INSTANCE_DIR,
        instance_prompt=INSTANCE_PROMPT,
        image_captions=image_captions,
        class_data_root=None,
        class_prompt=None,
        tokenizer=tokenizer,
        size=512,
        center_crop=False,
    )
    print("Display a few training examples")
    for idx, ex in enumerate(train_dataset):
        display_example(tokenizer, ex)
        if idx >= 3:
            break


def main(args):
    logging_dir = Path(args.output_dir, args.logging_dir)

    accelerator = Accelerator(
        gradient_accumulation_steps=args.gradient_accumulation_steps,
        mixed_precision=args.mixed_precision,
        log_with="tensorboard",
        logging_dir=logging_dir,
    )

    # Currently, it's not possible to do gradient accumulation when training two models with accelerate.accumulate
    # This will be enabled soon in accelerate. For now, we don't allow gradient accumulation when training two models.
    # TODO (patil-suraj): Remove this check when gradient accumulation with two models is enabled in accelerate.
    if args.train_text_encoder and args.gradient_accumulation_steps > 1 and accelerator.num_processes > 1:
        raise ValueError(
            "Gradient accumulation is not supported when training the text encoder in distributed training. "
            "Please set gradient_accumulation_steps to 1. This feature will be supported in the future."
        )

    if args.seed is not None:
        set_seed(args.seed)

    if args.with_prior_preservation:
        class_images_dir = Path(args.class_data_dir)
        if not class_images_dir.exists():
            class_images_dir.mkdir(parents=True)
        cur_class_images = len(list(class_images_dir.iterdir()))

        if cur_class_images < args.num_class_images:
            torch_dtype = torch.float16 if accelerator.device.type == "cuda" else torch.float32
            pipeline = DiffusionPipeline.from_pretrained(
                args.pretrained_model_name_or_path,
                torch_dtype=torch_dtype,
                safety_checker=None,
                revision=args.revision,
            )
            pipeline.set_progress_bar_config(disable=True)

            num_new_images = args.num_class_images - cur_class_images
            logger.info(f"Number of class images to sample: {num_new_images}.")

            sample_dataset = PromptDataset(args.class_prompt, num_new_images)
            sample_dataloader = torch.utils.data.DataLoader(sample_dataset, batch_size=args.sample_batch_size)

            sample_dataloader = accelerator.prepare(sample_dataloader)
            pipeline.to(accelerator.device)

            for example in tqdm(
                sample_dataloader, desc="Generating class images", disable=not accelerator.is_local_main_process
            ):
                images = pipeline(example["prompt"]).images

                for i, image in enumerate(images):
                    hash_image = hashlib.sha1(image.tobytes()).hexdigest()
                    image_filename = class_images_dir / f"{example['index'][i] + cur_class_images}-{hash_image}.jpg"
                    image.save(image_filename)

            del pipeline
            if torch.cuda.is_available():
                torch.cuda.empty_cache()

    # Handle the repository creation
    if accelerator.is_main_process:
        if args.push_to_hub:
            if args.hub_model_id is None:
                repo_name = get_full_repo_name(Path(args.output_dir).name, token=args.hub_token)
            else:
                repo_name = args.hub_model_id
            repo = Repository(args.output_dir, clone_from=repo_name)

            with open(os.path.join(args.output_dir, ".gitignore"), "w+") as gitignore:
                if "step_*" not in gitignore:
                    gitignore.write("step_*\n")
                if "epoch_*" not in gitignore:
                    gitignore.write("epoch_*\n")
        elif args.output_dir is not None:
            os.makedirs(args.output_dir, exist_ok=True)

    # Load the tokenizer
    if args.tokenizer_name:
        tokenizer = AutoTokenizer.from_pretrained(
            args.tokenizer_name,
            revision=args.revision,
            use_fast=False,
        )
    elif args.pretrained_model_name_or_path:
        tokenizer = AutoTokenizer.from_pretrained(
            args.pretrained_model_name_or_path,
            subfolder="tokenizer",
            revision=args.revision,
            use_fast=False,
        )

    # import correct text encoder class
    text_encoder_cls = import_model_class_from_model_name_or_path(args.pretrained_model_name_or_path, args.revision)

    # Load models and create wrapper for stable diffusion
    text_encoder = text_encoder_cls.from_pretrained(
        args.pretrained_model_name_or_path,
        subfolder="text_encoder",
        revision=args.revision,
    )
    vae = AutoencoderKL.from_pretrained(
        args.pretrained_model_name_or_path,
        subfolder="vae",
        revision=args.revision,
    )
    unet = UNet2DConditionModel.from_pretrained(
        args.pretrained_model_name_or_path,
        subfolder="unet",
        revision=args.revision,
    )

    vae.requires_grad_(False)
    if not args.train_text_encoder:
        text_encoder.requires_grad_(False)

    if args.gradient_checkpointing:
        unet.enable_gradient_checkpointing()
        if args.train_text_encoder:
            text_encoder.gradient_checkpointing_enable()

    if args.scale_lr:
        args.learning_rate = (
            args.learning_rate * args.gradient_accumulation_steps * args.train_batch_size * accelerator.num_processes
        )

    # Use 8-bit Adam for lower memory usage or to fine-tune the model in 16GB GPUs
    if args.use_8bit_adam:
        try:
            import bitsandbytes as bnb
        except ImportError:
            raise ImportError(
                "To use 8-bit Adam, please install the bitsandbytes library: `pip install bitsandbytes`."
            )
        optimizer_class = bnb.optim.AdamW8bit
    else:
        optimizer_class = torch.optim.AdamW

    params_to_optimize = (
        itertools.chain(unet.parameters(), text_encoder.parameters()) if args.train_text_encoder else unet.parameters()
    )
    optimizer = optimizer_class(
        params_to_optimize,
        lr=args.learning_rate,
        betas=(args.adam_beta1, args.adam_beta2),
        weight_decay=args.adam_weight_decay,
        eps=args.adam_epsilon,
    )

    noise_scheduler = DDPMScheduler.from_config(args.pretrained_model_name_or_path, subfolder="scheduler")

    print('Image captions:', image_captions)
    train_dataset = DreamBoothDataset(
        image_captions=image_captions,
        instance_data_root=args.instance_data_dir,
        instance_prompt=args.instance_prompt,
        class_data_root=args.class_data_dir if args.with_prior_preservation else None,
        class_prompt=args.class_prompt,
        tokenizer=tokenizer,
        size=args.resolution,
        center_crop=args.center_crop,
    )

    def collate_fn(examples):
        input_ids = [example["instance_prompt_ids"] for example in examples]
        pixel_values = [example["instance_images"] for example in examples]

        # Concat class and instance examples for prior preservation.
        # We do this to avoid doing two forward passes.
        if args.with_prior_preservation:
            input_ids += [example["class_prompt_ids"] for example in examples]
            pixel_values += [example["class_images"] for example in examples]

        pixel_values = torch.stack(pixel_values)
        pixel_values = pixel_values.to(memory_format=torch.contiguous_format).float()

        input_ids = tokenizer.pad(
            {"input_ids": input_ids},
            padding="max_length",
            max_length=tokenizer.model_max_length,
            return_tensors="pt",
        ).input_ids

        batch = {
            "input_ids": input_ids,
            "pixel_values": pixel_values,
        }
        return batch

    train_dataloader = torch.utils.data.DataLoader(
        train_dataset, batch_size=args.train_batch_size, shuffle=True, collate_fn=collate_fn, num_workers=1
    )

    # Scheduler and math around the number of training steps.
    overrode_max_train_steps = False
    num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
    if args.max_train_steps is None:
        args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch
        overrode_max_train_steps = True

    lr_scheduler = get_scheduler(
        args.lr_scheduler,
        optimizer=optimizer,
        num_warmup_steps=args.lr_warmup_steps * args.gradient_accumulation_steps,
        num_training_steps=args.max_train_steps * args.gradient_accumulation_steps,
    )

    if args.train_text_encoder:
        unet, text_encoder, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
            unet, text_encoder, optimizer, train_dataloader, lr_scheduler
        )
    else:
        unet, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
            unet, optimizer, train_dataloader, lr_scheduler
        )

    weight_dtype = torch.float32
    if accelerator.mixed_precision == "fp16":
        weight_dtype = torch.float16
    elif accelerator.mixed_precision == "bf16":
        weight_dtype = torch.bfloat16

    # Move text_encoder and vae to gpu.
    # For mixed precision training we cast the text_encoder and vae weights to half-precision
    # as these models are only used for inference, keeping weights in full precision is not required.
    vae.to(accelerator.device, dtype=weight_dtype)
    if not args.train_text_encoder:
        text_encoder.to(accelerator.device, dtype=weight_dtype)

    # We need to recalculate our total training steps as the size of the training dataloader may have changed.
    num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
    if overrode_max_train_steps:
        args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch
    # Afterwards we recalculate our number of training epochs
    args.num_train_epochs = math.ceil(args.max_train_steps / num_update_steps_per_epoch)

    # We need to initialize the trackers we use, and also store our configuration.
    # The trackers initialize automatically on the main process.
    if accelerator.is_main_process:
        accelerator.init_trackers("dreambooth", config=vars(args))

    # Train!
    total_batch_size = args.train_batch_size * accelerator.num_processes * args.gradient_accumulation_steps

    logger.info("***** Running training *****")
    logger.info(f"  Num examples = {len(train_dataset)}")
    logger.info(f"  Num batches each epoch = {len(train_dataloader)}")
    logger.info(f"  Num Epochs = {args.num_train_epochs}")
    logger.info(f"  Instantaneous batch size per device = {args.train_batch_size}")
    logger.info(f"  Total train batch size (w. parallel, distributed & accumulation) = {total_batch_size}")
    logger.info(f"  Gradient Accumulation steps = {args.gradient_accumulation_steps}")
    logger.info(f"  Total optimization steps = {args.max_train_steps}")

    # Only show the progress bar once on each machine.
    progress_bar = tqdm(range(args.max_train_steps), disable=not accelerator.is_local_main_process)
    progress_bar.set_description("Steps")
    global_step = 0

    for epoch in range(args.num_train_epochs):
        unet.train()
        if args.train_text_encoder:
            text_encoder.train()
        for step, batch in enumerate(train_dataloader):
            with accelerator.accumulate(unet):
                # Convert images to latent space
                latents = vae.encode(batch["pixel_values"].to(dtype=weight_dtype)).latent_dist.sample()
                latents = latents * 0.18215

                # Sample noise that we'll add to the latents
                noise = torch.randn_like(latents)
                bsz = latents.shape[0]
                # Sample a random timestep for each image
                timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps, (bsz,), device=latents.device)
                timesteps = timesteps.long()

                # Add noise to the latents according to the noise magnitude at each timestep
                # (this is the forward diffusion process)
                noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

                # Get the text embedding for conditioning
                encoder_hidden_states = text_encoder(batch["input_ids"])[0]

                # Predict the noise residual
                model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample

                # Get the target for loss depending on the prediction type
                if noise_scheduler.config.prediction_type == "epsilon":
                    target = noise
                elif noise_scheduler.config.prediction_type == "v_prediction":
                    target = noise_scheduler.get_velocity(latents, noise, timesteps)
                else:
                    raise ValueError(f"Unknown prediction type {noise_scheduler.config.prediction_type}")

                if args.with_prior_preservation:
                    # Chunk the noise and model_pred into two parts and compute the loss on each part separately.
                    model_pred, model_pred_prior = torch.chunk(model_pred, 2, dim=0)
                    target, target_prior = torch.chunk(target, 2, dim=0)

                    # Compute instance loss
                    loss = F.mse_loss(model_pred.float(), target.float(), reduction="none").mean([1, 2, 3]).mean()

                    # Compute prior loss
                    prior_loss = F.mse_loss(model_pred_prior.float(), target_prior.float(), reduction="mean")

                    # Add the prior loss to the instance loss.
                    loss = loss + args.prior_loss_weight * prior_loss
                else:
                    loss = F.mse_loss(model_pred.float(), target.float(), reduction="mean")

                accelerator.backward(loss)
                if accelerator.sync_gradients:
                    params_to_clip = (
                        itertools.chain(unet.parameters(), text_encoder.parameters())
                        if args.train_text_encoder
                        else unet.parameters()
                    )
                    accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
                optimizer.step()
                lr_scheduler.step()
                optimizer.zero_grad()

            # Checks if the accelerator has performed an optimization step behind the scenes
            if accelerator.sync_gradients:
                progress_bar.update(1)
                global_step += 1

                if global_step % args.save_steps == 0:
                    if accelerator.is_main_process:
                        pipeline = DiffusionPipeline.from_pretrained(
                            args.pretrained_model_name_or_path,
                            unet=accelerator.unwrap_model(unet),
                            text_encoder=accelerator.unwrap_model(text_encoder),
                            revision=args.revision,
                        )
                        save_path = os.path.join(args.output_dir, f"checkpoint-{global_step}")
                        pipeline.save_pretrained(save_path)

            logs = {"loss": loss.detach().item(), "lr": lr_scheduler.get_last_lr()[0]}
            progress_bar.set_postfix(**logs)
            accelerator.log(logs, step=global_step)

            if global_step >= args.max_train_steps:
                break

    accelerator.wait_for_everyone()

    # Create the pipeline using the trained modules and save it.
    if accelerator.is_main_process:
        pipeline = DiffusionPipeline.from_pretrained(
            args.pretrained_model_name_or_path,
            unet=accelerator.unwrap_model(unet),
            text_encoder=accelerator.unwrap_model(text_encoder),
            revision=args.revision,
        )
        pipeline.save_pretrained(args.output_dir)

        if args.push_to_hub:
            repo.push_to_hub(commit_message="End of training", blocking=False, auto_lfs_prune=True)

    accelerator.end_training()


# display_training_examples()
import accelerate
accelerate.notebook_launcher(main, args=(args,))
Using image captions: {'/content/drive/MyDrive/sd/images_training/be2c57cd-cc69-47f2-acbb-3cdb93a81b28.jpg': ['an asian woman in a white shirt posing for a picture', 'the woman is posing for the camera to show off her blue eyes', 'the young girl is posing for a picture', 'an asian woman with long hair, wearing a white shirt and denim overall', 'the young woman has long hair wearing white blouse'], '/content/drive/MyDrive/sd/images_training/335e70b8-85f8-4c7d-b4a5-c60428aef025.jpg': ['a woman wearing a black dress is leaning on a bed', 'a young girl laying on a bed while holding a controller', 'a woman in black and white dress holding her hand to a pole', 'an asian girl holding onto the arm of a white metal iron', 'a young woman posing with her hand on the side of a hand rail'], '/content/drive/MyDrive/sd/images_training/8a0205b5-7926-4753-8237-c40bb2838086.jpg': ['a pretty girl leaning on a wall looking to her left', 'a girl in blue shirt leaning against a yellow pole', 'a girl is staring out from a bus', 'a woman wearing a dress near a yellow wall', 'a young woman with bangs and eyes looking straight ahead'], '/content/drive/MyDrive/sd/images_training/0f369a7f-e18e-476a-a98a-f8ae901864ed.jpg': ['a girl with a leaf on her head', 'the young girl is posing with the bird on her head', 'a young girl posing in a wooded area holding her bird on the head', 'a little girl with a leaf on her head', 'a woman with a striped shirt and a green leaf on top of her head'], '/content/drive/MyDrive/sd/images_training/9d1ca46d-2071-474a-b8f5-7150dd9452da.jpg': ['an asian girl with blue eyes stares into the camera', 'a girl with very long hair and bangs', 'a girl in school uniform poses for a portrait', 'a close up of a person with a tie on', 'a girl with a tie is staring at the camera'], '/content/drive/MyDrive/sd/images_training/e4b74d9e-77ae-4161-8a1c-683623f4e963.jpg': ['a woman standing up against a purple background', 'the asian woman is posing for the photo', 'an asian woman 
poses for a portrait against a purple background', 'a young lady in black and white poses on a purple wall', 'an asian female in a black and white dress'], '/content/drive/MyDrive/sd/images_training/bf012971-ebb3-4007-a0ce-49a8a565b93a.jpg': ['woman holding wii controller in front of a camera', 'a woman with black hair, wearing white shirt', 'a woman is leaning on a rail holding a remote control', 'an attractive woman staring at the camera', 'the young asian girl is wearing a white t - shirt'], '/content/drive/MyDrive/sd/images_training/ffee321a-a9cb-4c05-a62e-59e832560f16.jpg': ['a girl with a messy ponytail standing against the wall', 'a beautiful young asian woman posing for a picture', 'the young woman has her eyes wide open and she is in white dress with flowers', 'a young asian woman in white is posing', 'a young asian girl in a white dress posing'], '/content/drive/MyDrive/sd/images_training/17ea5ae0-0e22-4945-8b42-f2a89dd90876.jpg': ['a girl with an oversized scarf is sitting on a bench', 'a woman sitting down and posing for the camera', 'a young woman is sitting down with a scarf', 'a beautiful asian woman wearing a grey scarf and red checkered pants', 'an asian woman sitting on a bench wearing a white scarf'], '/content/drive/MyDrive/sd/images_training/fcdf765b-72d4-4ffe-8305-7d9e48033fb1.jpg': ['a woman sitting at a table with her hands on her chin', 'a woman with long hair sitting with her chin resting on the table', 'a girl is sitting by the table and thinking', 'young girl wearing a white blouse with long hair', 'a woman with black hair and a long sleeved white shirt'], '/content/drive/MyDrive/sd/images_training/7c784480-cdfa-4fb1-88dc-cebefa1be0de.jpg': ['a beautiful young asian woman wearing a leopard print sweater', 'young girl with long straight hair leaning against white wall', 'a woman leaning against a white wall next to a wall', 'a woman leans against a wall with a leopard sweater on', 'a very pretty asian woman leaning against a wall'], 
'/content/drive/MyDrive/sd/images_training/019b20f8-cb84-4ef7-b4d9-b298856f4c81.jpg': ['a young girl sits in an airplane wearing a green hoodie', 'an asian girl in green jacket sitting at table with straw basket and baskets', 'a small child standing on a ledge above a valley', 'a young girl sitting on top of a chair', 'a little girl looks over at the camera'], '/content/drive/MyDrive/sd/images_training/0265b88b-e278-46bd-93db-5286efffb357.jpg': ['a woman in a santa hat and dress smiling', 'there is a girl that is wearing a hat on her head', 'a young woman in santa hat smiles for the camera', 'asian girl wearing a santa claus hat, smiling for the camera', 'asian woman wearing santa hat with holiday lights behind her'], '/content/drive/MyDrive/sd/images_training/15060cb6-b9f9-4099-89ed-52933b992e28.jpg': ['a young woman in white shirt holding a purple object', 'a woman that is standing next to a wooden table', 'young woman with long brown hair and a white shirt posing in front of dark wall', 'a young woman is wearing purple and white clothes', 'a girl with long brown hair wearing a white blouse'], '/content/drive/MyDrive/sd/images_training/d9c8300c-dca4-49f2-b38f-df09cf079100.jpg': ['a beautiful asian girl with dark hair posing for a picture', 'a woman sitting by a window wearing a pink coat', 'a close up of a person wearing a pink jacket and gloves', 'a girl with long black hair posing for the camera', 'a beautiful young asian woman sitting on a bench'], '/content/drive/MyDrive/sd/images_training/1e9388c5-067a-4dc9-84f1-29d05cb576ea.jpg': ['asian girl in pink dress brushing teeth with toothbrush', 'young woman with pink top brushing her teeth', 'a young woman is brushing her teeth in front of a mirror', 'a woman in a pink shirt is brushing her teeth', 'a young woman brushing her teeth in front of a mirror'], '/content/drive/MyDrive/sd/images_training/1287ea6d-83cb-44a4-8093-521580bd5271.jpg': ['woman with long dark hair posing for photo', 'a girl that has long brown 
hair', 'a beautiful woman with long brown hair sitting down', 'a woman looking to the right', 'woman with long black hair and grey shirt posing'], '/content/drive/MyDrive/sd/images_training/38bb0e3d-124b-470a-8727-b7a0c2922289.jpg': ['a girl brushes her teeth outside, with her hand', 'a close up of a person holding a brush in her mouth', 'the girl is holding a toothbrush in her mouth', 'the girl is eating a straw in the field', 'a girl with a spoon standing by herself'], '/content/drive/MyDrive/sd/images_training/763535ce-0c8b-4219-8fac-1fb153559a0b.jpg': ['a person wearing shorts sitting on a bed next to a window', 'an asian woman wearing blue and yellow looking at the camera', 'a very cute girl sitting in front of a window', 'a close up of a person in a dress near a window', 'a beautiful woman sitting in front of a window'], '/content/drive/MyDrive/sd/images_training/39ed29af-0c2e-4e12-9251-5938abddede3.jpg': ['a person with brown hair covering their face behind a fence', 'a close up of a woman holding her hands over her mouth', 'the woman has a hooded sweatshirt around her neck', 'a girl hiding her mouth under a blue jacket', 'there is a woman that is covering her face with a jacket'], '/content/drive/MyDrive/sd/images_training/7040c264-3016-4fab-9073-a1ea3b402867.png': ['a woman with long black hair looking at the camera', 'the girl in white shirt has her hair long', 'an asian woman is standing in front of a car', 'young japanese woman with long brown hair near a car', 'a young girl with long hair and bangs'], '/content/drive/MyDrive/sd/images_training/5b683650-3d6e-4b0b-bce7-928c09733ad8.jpg': ['the woman is posing for a photo while sitting', 'an asian woman sitting on a white couch', 'a young woman laying on top of a bed next to a pillow', 'asian girl with brown hair and bangs laying on the floor with hands clasped up', 'the girl in the white shirt is sitting by her bed'], '/content/drive/MyDrive/sd/images_training/a562af99-a96b-4425-adf1-0d8215c0df4d.jpg': 
['a girl in a knitted hat and shirt', 'a woman wearing a grey beanie holding her hands under her hair', 'a young woman wearing a hat and staring directly', 'asian woman wearing a grey beanie with black trim', 'woman in white shirt putting on hat with hand'], '/content/drive/MyDrive/sd/images_training/be2fd2fd-cb1f-4415-8aa1-85fc17cebbc9.jpg': ['a girl with long dark hair laying on top of a sofa', 'the woman has dark hair and bangs on her head', 'a young asian lady with dark hair', 'the asian woman is wearing an oversized vest', 'woman holding up a blue cloth and wearing a gray shirt'], '/content/drive/MyDrive/sd/images_training/3133ea61-cf12-401b-98cb-9378d53dbb80.jpg': ['a beautiful young asian woman standing next to a wall', 'there is a young asian woman posing for the camera', 'asian girl standing by a white wall', 'a young asian lady is standing near a wall', 'a young woman leaning against a wall and looking off into the distance'], '/content/drive/MyDrive/sd/images_training/5ade1dbb-9ed5-49a7-9d1b-e3be262bd820.jpg': ['a woman that is wearing some kind of white shirt', 'a girl with a dark hair poses on the beach', 'woman with black hair in front of an ocean', 'a woman with long hair is staring away', 'young woman wearing t - shirt looking away from camera'], '/content/drive/MyDrive/sd/images_training/fa17157a-280a-4d7e-a8d8-bb7b115e5575.jpg': ['a young woman has long brown hair and bangs', 'a very pretty woman with a neat look', 'the woman with a long hair wearing a white shirt', 'the head of a woman with long brown hair', 'a young lady with long hair and big bangs'], '/content/drive/MyDrive/sd/images_training/9e967a0e-9cad-40a1-8791-57baca670e13.jpg': ['a pretty young woman holding a teddy bear', 'a woman holding a teddy bear while smiling', 'a beautiful young lady holding a stuffed teddy bear', 'asian woman in white dress holding a cream colored teddy bear', 'young woman holding a teddy bear while posing for picture'], 
'/content/drive/MyDrive/sd/images_training/3a1cc96c-2cb2-4430-b661-74650d8b351f.jpg': ['a girl poses in the grass near an oat plant', 'the girl is staring at the camera', 'young girl in blue shirt with dark hair and fringes', 'a young asian girl with long brown hair', 'a young woman in blue shirt posing for a picture']}
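The output above maps each training image path to several candidate captions. When writing per-image prompts for training, one simple heuristic is to keep a single caption per image, for example the shortest candidate. A minimal sketch (the `captions` dict here is a tiny stand-in for the full mapping printed above, and `pick_caption` is a hypothetical helper, not part of the training script):

```python
# A tiny stand-in for the full image -> candidate-captions mapping above.
captions = {
    "img_a.jpg": [
        "a young woman in white shirt holding a purple object",
        "a girl with long brown hair wearing a white blouse",
    ],
    "img_b.jpg": [
        "a woman in a santa hat and dress smiling",
        "asian woman wearing santa hat with holiday lights behind her",
    ],
}

def pick_caption(candidates):
    """Choose the shortest candidate caption (a simple, deterministic heuristic)."""
    return min(candidates, key=len)

# Reduce the mapping to one caption per image.
chosen = {path: pick_caption(cands) for path, cands in captions.items()}
for path, caption in chosen.items():
    print(f"{path}: {caption}")
```

Other heuristics (picking the most detailed caption, or averaging over all candidates during training) are equally valid; the point is just to turn the raw caption dump into per-image prompts.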
Launching training on one GPU.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
Here are the results using the same settings:
You can see from the results above that the model now respects the “wearing a hat” prompt.
Appendix
[Optional] xformers
xformers enables memory-efficient attention, which can substantially reduce GPU memory usage during training. If the prebuilt package doesn't work in your environment, you can build it from source with pip install -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers (building takes a long time!)
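Before passing the xformers flag to the training script, it can be useful to verify that the package is actually importable in the current runtime. A small helper sketch (diffusers performs a similar availability check internally):

```python
import importlib.util

def xformers_available() -> bool:
    """Return True if the xformers package can be imported in this environment."""
    return importlib.util.find_spec("xformers") is not None

print("xformers available:", xformers_available())
```

If this prints False after installation, restart the Colab runtime so the freshly installed package is picked up.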